Abstract



The case of adaptive testing under a multidimensional logistic response model is addressed. 
An adaptive algorithm is proposed that minimizes the (asymptotic) variance of the maximum- 
likelihood estimator of a linear combination of abilities of interest. The item selection 
criterion is a simple expression in closed form. In addition, it is shown how the algorithm can 
be adapted if the interest is in a test with a "simple information structure". The statistical 
properties of the adaptive ML estimator are demonstrated for a two-dimensional item pool 
with several linear combinations of the two abilities. 
Multidimensional Adaptive Testing
with a Minimum Error Variance Criterion

Adaptive testing algorithms for item pools calibrated under a unidimensional item 
response theory (IRT) model have been well investigated (e.g., Lord, 1980; Wainer, 1990), 
and several large-scale testing programs are in the process of introducing adaptive testing as 
an alternative to traditional paper-and-pencil tests. Since these programs need large item pools 
to guarantee measurement precision, in particular if measures to balance test content and 
control item exposure are implemented, violations of the assumption of unidimensionality of 
the item pool can be expected. Study of algorithms for adaptive testing under a 
multidimensional model seems therefore a timely matter. 

The present paper is a sequel to van der Linden (1996) in which the problem of 
optimal assembly of a fixed test form from an item pool measuring multiple abilities is 
addressed. The emphasis in the work underlying this earlier paper was on an algorithm for
assembling the fixed form to match optimally a set of targets for the (asymptotic) error
variance functions for the abilities subject to a large variety of constraints on the composition 
of the test. It is the purpose of the present paper to study the use of the error variance as an 
item-selection criterion in tests with an adaptive format. Independent results on 
multidimensional adaptive testing are presented in Fan and Hsu (April, 1996) and Segall 
(1996). The interest in the former is in investigating the differences between item selection 
criteria based on various types of multivariate information measures rather than the error 
variance of the estimator(s). Also, these measures are evaluated over random sampling of 
correlated abilities. In the latter, the volumes of the confidence ellipsoid and the posterior 
credibility ellipsoid are proposed as multivariate item selection criteria. The posterior
credibility ellipsoid is an attractive criterion because it makes it possible to build prior
knowledge on dependencies between the ability variables into the item selection procedure.
We will return to this point later in the paper. 

The paper is organized as follows: The following section introduces the 
multidimensional IRT logistic model used in the presentation of the algorithm and motivates a 
linear combination of abilities as the parameter of interest in multidimensional adaptive 
testing. The subsequent section discusses the (asymptotic) variance of the estimator of a linear 
combination of ability parameters. Then it is proposed to minimize the variance of this 
estimator as a criterion for multidimensional adaptive testing, and an adaptive algorithm 
minimizing the variance is presented. The algorithm involves expressions of the item 
parameters which are easy to evaluate. The last section demonstrates the use of the algorithm 
for a two-dimensional item pool and investigates the statistical properties of the adaptive 
estimator for various linear combinations of the two abilities. 
Multidimensional Model

Dichotomous response variables $U_i$ are used to denote the responses of an examinee to
items $i=1,\ldots,n$. The variables take the value 1 if the response is correct and the value 0 if it is
incorrect. The model is the following multivariate logistic response function:

$$P_i(\theta) \equiv \mathrm{Prob}\{U_i = 1 \mid \theta, a_i, d_i\} = \frac{\exp(a_i'\theta - d_i)}{1 + \exp(a_i'\theta - d_i)}, \qquad (1)$$

where $\theta = (\theta_1,\ldots,\theta_j,\ldots,\theta_m)$, with $-\infty<\theta_j<\infty$ for $j=1,\ldots,m$, is a vector of m ability variables,
$a_i = (a_{i1},\ldots,a_{ij},\ldots,a_{im})$, with $a_{ij}>0$ for $j=1,\ldots,m$, is the vector of loadings of item i on these abilities
(item discriminations), and $-\infty<d_i<\infty$ is a scalar representing a linear combination of the
difficulties of the item along the ability dimensions. Detailed information about the model is
given in McKinley and Reckase (1983), Reckase (1985, 1996), and Samejima (1974).
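
As a minimal illustration of the response function in (1), the following Python sketch evaluates $P_i(\theta)$ for a single item. The function name `response_probability` and the use of plain tuples for $a_i$ and $\theta$ are assumptions made for this example only.

```python
import math

def response_probability(theta, a, d):
    """Evaluate P_i(theta) in (1): a logistic function of a'theta - d."""
    z = sum(a_j * t_j for a_j, t_j in zip(a, theta)) - d
    # Branch on the sign of z so that exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# An item loading mainly on the first ability, evaluated at theta = (0, 0).
p = response_probability(theta=(0.0, 0.0), a=(1.2, 0.1), d=0.0)
```

Because $a_i'\theta - d_i = 0$ in this example, the probability of a correct response is one half.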

It is assumed that the item parameters $a_i$ and $d_i$ have already been estimated, and that
the estimates are sufficiently accurate to consider them as the true parameter values. The 
parameters can be estimated using the Bayesian methods implemented in the program 
TESTFACT (Wilson, Wood, & Gibbons, 1984), or through McDonald’s (1996) harmonic 
analysis applied to a normal approximation to the logistic function implemented in the 
program NOHARM (Fraser & McDonald, 1988). 

Parameter of Interest 

It is assumed that the parameter to be estimated by the adaptive testing procedure is a
linear combination of the abilities, $\lambda'\theta$, where $\lambda=(\lambda_1,\ldots,\lambda_m)$ is a vector of nonnegative
weights. The choice of this parameter is motivated by the following practical cases:

1. The item pool is intentionally designed to measure more than one ability. However, 
the consumers of the test scores only want a single number to be reported. An 
obvious example is an item pool for a test to predict a future criterion of success in a 
selection problem, where the criterion is multifaceted. In this case, the weights $\lambda_j$ are
to reflect the relative importance of the individual abilities with respect to the
criterion.

2. The item pool is designed to measure only one ability but the items are sensitive to
some "nuisance abilities" as well. A well-known example is a test for mathematical
ability depending on verbal abilities required to understand the items. This case can
be dealt with by setting the weights $\lambda_j=0$ for all nuisance abilities. As will become
clear below, this measure does not neutralize the effect of the values of the nuisance
parameters on the variance of the estimator of the intended ability but does allow for
direct minimization of this variance.

3. Even though the item pool measures several abilities, different sections of the test
may be required to be maximally informative with respect to different abilities, for
example, because identifiable subsections of the test are to be used for diagnosing
individual abilities. As will be shown below, an adaptive test with a "simple ability
structure" over the ability space can be realized by choosing different values for the
weights $\lambda_j$ at different stages of the procedure. Note that this case is not equivalent
to the one of choosing items from different unidimensional item pools; rather than
rotating the selection of the items across unidimensional pools, the weights $\lambda_j$ are
rotated across different preselected values while selecting items from a
multidimensional pool.

For an extended description of the above cases of multidimensional testing using the format of 
a fixed-form test, see van der Linden (1996). 

Ability Estimation 

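It is assumed that $\lambda'\theta$ is estimated by the method of maximum likelihood (ML). As is well known, for
maximum-likelihood estimators (MLEs) it holds that

$$\widehat{\lambda'\theta} = \lambda'\hat{\theta}. \qquad (2)$$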



For a response pattern $(u_1,\ldots,u_n)$, the likelihood of $\theta$ is defined as

$$L(\theta; u_1,\ldots,u_n, a_1,\ldots,a_n, d_1,\ldots,d_n) = \prod_{i=1}^{n} \mathrm{Prob}\{U_i = u_i \mid \theta, a_i, d_i\}. \qquad (3)$$



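The joint MLE of $\theta_j$, $j=1,\ldots,m$, is the vector of values maximizing this likelihood. The
likelihood equations are obtained by setting the partial derivatives of the log of (3) equal to zero:

$$\frac{\partial \ln L}{\partial \theta_j} = \frac{\partial}{\partial \theta_j} \sum_{i=1}^{n} \left[ u_i \ln P_i(\theta) + (1-u_i)\ln(1-P_i(\theta)) \right] = 0, \quad j=1,\ldots,m. \qquad (4)$$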
Using

$$\frac{\partial P_i(\theta)}{\partial \theta_j} = a_{ij} P_i(\theta)[1 - P_i(\theta)], \qquad (5)$$

the likelihood equations can be written as

$$\sum_{i=1}^{n} a_i [u_i - P_i(\theta)] = 0, \qquad (6)$$

which is the common form known to exist for a model belonging to the exponential family
(Andersen, 1980, sect. 3.2). The system can be solved using Newton's method,

$$\theta^{(t)} = \theta^{(t-1)} - [H(\theta^{(t-1)})]^{-1}\gamma(\theta^{(t-1)}), \qquad (7)$$

where $H(\theta^{(t-1)})$ is the Hessian of the log-likelihood function with elements

$$\frac{\partial^2 \ln L}{\partial\theta_g \partial\theta_h} = -\sum_{i=1}^{n} a_{ig} a_{ih} P_i(\theta)[1 - P_i(\theta)], \quad g,h = 1,\ldots,m, \qquad (8)$$

and $\gamma(\theta^{(t-1)})$ is the gradient of the log-likelihood function, i.e., the vector with the first
derivatives set equal to zero in (6), both evaluated at step t-1. Substitution of the results from
(7) into (2) gives the MLE of $\lambda'\theta$.
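
The Newton iteration in (7) can be sketched for the two-dimensional case as follows. This is an illustration under stated assumptions, not the implementation used in the paper: the names `prob` and `newton_mle`, the starting value (0, 0), the clamping of each estimate to ±`bound`, and the stopping tolerance are all choices made for this example.

```python
import math

def prob(theta, item):
    """Response probability (1) for a two-dimensional item (a1, a2, d)."""
    a1, a2, d = item
    return 1.0 / (1.0 + math.exp(-(a1 * theta[0] + a2 * theta[1] - d)))

def newton_mle(items, responses, start=(0.0, 0.0), bound=4.0,
               max_iter=50, tol=1e-10):
    """Solve the likelihood equations (6) by Newton's method (7)."""
    t1, t2 = start
    for _ in range(max_iter):
        g1 = g2 = 0.0          # gradient: left-hand side of (6)
        u = v = w = 0.0        # entries of the negative Hessian, from (8)
        for item, x in zip(items, responses):
            a1, a2, _ = item
            p = prob((t1, t2), item)
            g1 += a1 * (x - p)
            g2 += a2 * (x - p)
            pq = p * (1.0 - p)
            u += a1 * a1 * pq
            v += a2 * a2 * pq
            w += a1 * a2 * pq
        det = u * v - w * w
        if det < 1e-12:        # (near-)singular Hessian: stop iterating
            break
        # Newton step -H^{-1} * gradient, with H = -[[u, w], [w, v]].
        s1 = (v * g1 - w * g2) / det
        s2 = (u * g2 - w * g1) / det
        t1 = max(-bound, min(bound, t1 + s1))
        t2 = max(-bound, min(bound, t2 + s2))
        if abs(s1) < tol and abs(s2) < tol:
            break
    return t1, t2

# Hypothetical items and a mixed response pattern (finite MLE).
items = [(1.2, 0.1, 0.0), (0.1, 1.2, 0.0), (1.0, 0.2, 0.5),
         (0.2, 1.0, -0.5), (0.8, 0.8, 0.0), (0.5, 1.1, 0.3)]
responses = [1, 0, 0, 1, 1, 0]
theta_hat = newton_mle(items, responses)
```

The MLE of $\lambda'\theta$ then follows from (2) as `lam * theta_hat[0] + (1 - lam) * theta_hat[1]`.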



Variance of $\lambda'\hat{\theta}$

The asymptotic covariance matrix of the MLE of $\theta$ is given by the inverse of Fisher's
information matrix

$$I(\theta) = \left( -E\left[ \frac{\partial^2 \ln L(\theta; U_1,\ldots,U_n, a_1,\ldots,a_n, d_1,\ldots,d_n)}{\partial\theta_p \partial\theta_q} \right] \right), \qquad (9)$$

with $L(\theta; U_1,\ldots,U_n, a_1,\ldots,a_n, d_1,\ldots,d_n)$ being the likelihood statistic associated with the
random response vector and $\theta_p$ and $\theta_q$ any two components of $\theta$. From (8), it follows that

$$-E\left[ \frac{\partial^2 \ln L(\theta; U_1,\ldots,U_n, a_1,\ldots,a_n, d_1,\ldots,d_n)}{\partial\theta_g \partial\theta_h} \right] = \sum_{i=1}^{n} a_{ig} a_{ih} P_i(\theta)[1 - P_i(\theta)]. \qquad (10)$$



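Standard techniques for matrix inversion yield the (asymptotic) covariance matrix

$$V \equiv \mathrm{Var}(\hat{\theta} \mid \theta) = I(\theta)^{-1}, \qquad (11)$$

where the determinant of the information matrix, $|I(\theta)|$, is assumed not to vanish. For
the linear combination $\lambda'\hat{\theta}$ it follows that

$$\mathrm{Var}(\lambda'\hat{\theta} \mid \theta) = \lambda' V \lambda. \qquad (12)$$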



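For a model with two ability parameters, $\theta = (\theta_1,\theta_2)$, and $(\lambda_1, \lambda_2) = (\lambda, 1-\lambda)$, the
result in (12) boils down to

$$\mathrm{Var}(\lambda\hat{\theta}_1 + (1-\lambda)\hat{\theta}_2 \mid \theta_1,\theta_2)$$

$$= \lambda^2 \mathrm{Var}(\hat{\theta}_1 \mid \theta_1,\theta_2) + (1-\lambda)^2 \mathrm{Var}(\hat{\theta}_2 \mid \theta_1,\theta_2) + 2\lambda(1-\lambda)\mathrm{Cov}(\hat{\theta}_1,\hat{\theta}_2 \mid \theta_1,\theta_2)$$

$$= \Big[ \lambda^2 \sum_{i=1}^{n} a_{i2}^2 P_i(\theta_1,\theta_2)\{1 - P_i(\theta_1,\theta_2)\} + (1-\lambda)^2 \sum_{i=1}^{n} a_{i1}^2 P_i(\theta_1,\theta_2)\{1 - P_i(\theta_1,\theta_2)\}$$

$$\quad - 2\lambda(1-\lambda) \sum_{i=1}^{n} a_{i1} a_{i2} P_i(\theta_1,\theta_2)\{1 - P_i(\theta_1,\theta_2)\} \Big] \Big/ |I(\theta_1,\theta_2)|, \qquad (13)$$

where

$$|I(\theta_1,\theta_2)| = \Big[\sum_{i=1}^{n} a_{i1}^2 P_i(\theta_1,\theta_2)\{1 - P_i(\theta_1,\theta_2)\}\Big]\Big[\sum_{i=1}^{n} a_{i2}^2 P_i(\theta_1,\theta_2)\{1 - P_i(\theta_1,\theta_2)\}\Big]$$

$$\quad - \Big[\sum_{i=1}^{n} a_{i1} a_{i2} P_i(\theta_1,\theta_2)\{1 - P_i(\theta_1,\theta_2)\}\Big]^2. \qquad (14)$$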



Note that for n=2 the two items should not be parallel, i.e., it should not hold that

$$a_{11}=a_{21}, \quad a_{12}=a_{22}, \quad d_1=d_2, \qquad (15)$$

because then the determinant in (13) vanishes.
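
The two-dimensional variance in (13) and the determinant in (14) are simple to evaluate numerically. The sketch below is only an illustration; the function name `composite_variance`, the tolerance used to detect a vanishing determinant, and the convention of returning infinity in that case are assumptions of this example.

```python
import math

def prob(theta, item):
    """Response probability (1) for a two-dimensional item (a1, a2, d)."""
    a1, a2, d = item
    return 1.0 / (1.0 + math.exp(-(a1 * theta[0] + a2 * theta[1] - d)))

def composite_variance(lam, items, theta):
    """Asymptotic variance (13) of lam*theta1_hat + (1 - lam)*theta2_hat."""
    u = v = w = 0.0
    for item in items:
        a1, a2, _ = item
        p = prob(theta, item)
        pq = p * (1.0 - p)
        u += a1 * a1 * pq      # information on theta1
        v += a2 * a2 * pq      # information on theta2
        w += a1 * a2 * pq      # cross term
    det = u * v - w * w        # determinant (14)
    if det < 1e-12:
        return math.inf        # singular information matrix, e.g. under (15)
    return (lam ** 2 * v + (1 - lam) ** 2 * u - 2 * lam * (1 - lam) * w) / det

# Two non-parallel items give a finite variance.
var_ok = composite_variance(0.5, [(1.2, 0.1, 0.0), (0.1, 1.2, 0.0)], (0.0, 0.0))

# Two parallel items, as in (15), give a vanishing determinant.
var_parallel = composite_variance(0.5, [(1.0, 1.0, 0.0), (1.0, 1.0, 0.0)], (0.0, 0.0))
```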



Adaptive Testing Algorithm 



For notational convenience, the adaptive testing algorithm is also presented for the
case of two ability parameters. The following definitions are needed: The items in the pool are
indexed by $i=1,\ldots,I$. The adaptive testing procedure is assumed to be stopped after n items
have been administered. The order of the items in the test is indexed by $k=1,\ldots,n$. Thus, $i_k$ is
the index of the item in the pool administered as the kth item in the test. Suppose k-1 items
have been selected. Let $S_k=\{i_1,\ldots,i_{k-1}\}$ denote this set of items. Then, $R_k=\{1,\ldots,I\}\setminus S_k$ is the set
of items not selected so far, and item $i_k$ has to be selected from this set. Finally, let $\hat{\theta}_1^k$ and $\hat{\theta}_2^k$ be
the estimators of $\theta_1$ and $\theta_2$ after k items have been administered.

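The kth item is selected according to the following criterion:

$$\min_{R_k} \left\{ \mathrm{Var}\left(\lambda\hat{\theta}_1^k + (1-\lambda)\hat{\theta}_2^k \mid \hat{\theta}_1^{k-1}, \hat{\theta}_2^{k-1}\right) \right\}, \qquad (16)$$

that is, the item is selected to minimize the variance of $\lambda\hat{\theta}_1^k + (1-\lambda)\hat{\theta}_2^k$ evaluated at the
current estimates, which is (13) for $(\theta_1,\theta_2) = (\hat{\theta}_1^{k-1}, \hat{\theta}_2^{k-1})$.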



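To implement the criterion, define

$$u^j \equiv \sum_{l=1}^{k-1} a_{i_l 1}^2 P_{i_l}\{1 - P_{i_l}\} + a_{j1}^2 P_j\{1 - P_j\}, \qquad (17)$$

$$v^j \equiv \sum_{l=1}^{k-1} a_{i_l 2}^2 P_{i_l}\{1 - P_{i_l}\} + a_{j2}^2 P_j\{1 - P_j\}, \qquad (18)$$

$$w^j \equiv \sum_{l=1}^{k-1} a_{i_l 1} a_{i_l 2} P_{i_l}\{1 - P_{i_l}\} + a_{j1} a_{j2} P_j\{1 - P_j\}, \qquad (19)$$

with all probabilities evaluated at $(\hat{\theta}_1^{k-1}, \hat{\theta}_2^{k-1})$.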



The criterion can thus be expressed in closed form as

$$i_k = \arg\min_j \left\{ \frac{\lambda^2 v^j + (1-\lambda)^2 u^j - 2\lambda(1-\lambda) w^j}{u^j v^j - (w^j)^2} ;\; j \in R_k \right\}. \qquad (20)$$

In order to select the next item, for each item in $R_k$ one term is added to the sums in (17)-(19).
Each term involves the parameters $a_{j1}$, $a_{j2}$, and $d_j$ as well as the probability $P_j(\theta_1,\theta_2)$, where
the last quantity is evaluated at the current estimates $\theta_1 = \hat{\theta}_1^{k-1}$ and $\theta_2 = \hat{\theta}_2^{k-1}$. The item
minimizing the expression in (20) is selected.
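
The selection rule in (20) can be sketched as below. The names `info_terms` and `select_next_item`, the representation of the pool as a list of `(a1, a2, d)` tuples, and the rule of skipping candidates with a vanishing denominator are assumptions made for this illustration.

```python
import math

def prob(theta, item):
    """Response probability (1) for a two-dimensional item (a1, a2, d)."""
    a1, a2, d = item
    return 1.0 / (1.0 + math.exp(-(a1 * theta[0] + a2 * theta[1] - d)))

def info_terms(item, theta):
    """Contribution of one item to the three sums entering (20)."""
    a1, a2, _ = item
    p = prob(theta, item)
    pq = p * (1.0 - p)
    return a1 * a1 * pq, a2 * a2 * pq, a1 * a2 * pq

def select_next_item(lam, pool, administered, theta_hat):
    """Return the pool index of the unused item minimizing criterion (20)."""
    u0 = v0 = w0 = 0.0
    for i in administered:                 # sums over the items already given
        du, dv, dw = info_terms(pool[i], theta_hat)
        u0, v0, w0 = u0 + du, v0 + dv, w0 + dw
    best_index, best_value = None, math.inf
    for j, item in enumerate(pool):
        if j in administered:
            continue
        du, dv, dw = info_terms(item, theta_hat)   # one extra term per candidate
        u, v, w = u0 + du, v0 + dv, w0 + dw
        det = u * v - w * w
        if det < 1e-12:
            continue                       # criterion undefined for this candidate
        value = (lam ** 2 * v + (1 - lam) ** 2 * u - 2 * lam * (1 - lam) * w) / det
        if value < best_value:
            best_index, best_value = j, value
    return best_index

pool = [(1.2, 0.1, 0.0), (0.1, 1.2, 0.0), (1.3, 0.1, 0.0), (0.1, 1.3, 0.0)]
next_for_theta1 = select_next_item(1.0, pool, [0, 1], (0.0, 0.0))
next_for_theta2 = select_next_item(0.0, pool, [0, 1], (0.0, 0.0))
```

With $\lambda = 1$ the rule picks the candidate loading mainly on the first ability, and with $\lambda = 0$ the one loading mainly on the second.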

Algorithm. The algorithm can be summarized as follows:

1. Choose a value for $\lambda$ reflecting the validity of the test;

2. Select items $i_1$ and $i_2$ according to some external criterion;

3. Estimate $\theta_1$ and $\theta_2$ using the MLEs from (7);

4. Enter the values of the parameters of items $i_1$ and $i_2$ into (17)-(19);

5. Evaluate (17)-(19) for $i \in R_2$, and select the minimizer of (20);

6. Repeat Steps 3-5 for $k=3,\ldots,n$.

When selecting the items in Step 2, item parameters that approximate the conditions in (15)
should be avoided because they make (20) unstable. As long as the examinee produces
responses that are uniformly correct or incorrect, the MLEs of $\theta_1$ and $\theta_2$ have to be bounded by
well-chosen values.
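
The steps above can be put together in a short simulation. The following is a sketch under several assumptions that are not taken from the paper: the helper names, the bound of ±2 on the estimates, a fixed number of Newton iterations started at (0, 0), and responses simulated from (1) with a seeded random number generator.

```python
import math
import random

def prob(theta, item):
    a1, a2, d = item
    return 1.0 / (1.0 + math.exp(-(a1 * theta[0] + a2 * theta[1] - d)))

def info_sums(items, theta):
    """Sums of a1^2*PQ, a2^2*PQ and a1*a2*PQ over the given items."""
    u = v = w = 0.0
    for a1, a2, d in items:
        p = prob(theta, (a1, a2, d))
        pq = p * (1.0 - p)
        u, v, w = u + a1 * a1 * pq, v + a2 * a2 * pq, w + a1 * a2 * pq
    return u, v, w

def bounded_mle(items, responses, bound=2.0, max_iter=50):
    """Newton iterations (7) with estimates clamped to [-bound, bound]."""
    t = [0.0, 0.0]
    for _ in range(max_iter):
        u, v, w = info_sums(items, t)
        det = u * v - w * w
        if det < 1e-12:
            break
        g1 = sum(it[0] * (x - prob(t, it)) for it, x in zip(items, responses))
        g2 = sum(it[1] * (x - prob(t, it)) for it, x in zip(items, responses))
        t[0] = max(-bound, min(bound, t[0] + (v * g1 - w * g2) / det))
        t[1] = max(-bound, min(bound, t[1] + (u * g2 - w * g1) / det))
    return t[0], t[1]

def criterion(lam, u, v, w):
    """Expression minimized in (20)."""
    det = u * v - w * w
    if det < 1e-12:
        return math.inf
    return (lam ** 2 * v + (1 - lam) ** 2 * u - 2 * lam * (1 - lam) * w) / det

def adaptive_test(pool, true_theta, lam, n, start=(0, 1), seed=0):
    rng = random.Random(seed)
    answer = lambda i: int(rng.random() < prob(true_theta, pool[i]))
    n = min(n, len(pool))
    administered = list(start)                       # Step 2
    responses = [answer(i) for i in administered]
    while True:
        items = [pool[i] for i in administered]
        theta_hat = bounded_mle(items, responses)    # Step 3
        if len(administered) >= n:
            return administered, responses, theta_hat
        u0, v0, w0 = info_sums(items, theta_hat)     # Step 4
        candidates = [j for j in range(len(pool)) if j not in administered]

        def value(j):
            du, dv, dw = info_sums([pool[j]], theta_hat)
            return criterion(lam, u0 + du, v0 + dv, w0 + dw)

        best = min(candidates, key=value)            # Step 5
        administered.append(best)
        responses.append(answer(best))               # Step 6: loop continues

rng = random.Random(1)
pool = [(1.2, 0.1, 0.0), (0.1, 1.2, 0.0)] + [
    (rng.uniform(0.0, 1.3), rng.uniform(0.0, 1.3), rng.uniform(-1.3, 1.3))
    for _ in range(60)
]
administered, responses, theta_hat = adaptive_test(pool, (0.5, -0.5), lam=0.5, n=10)
```

Clamping the estimates is one simple way to bound the MLEs while the response pattern is still uniformly correct or incorrect; other bounding rules are equally compatible with the algorithm.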

Simple Structure. As already noted, different sections of the test may be required to
have good measurement qualities with respect to different abilities ("simple ability structure").
For two abilities, let $n_1$ and $n_2$ be the required numbers of items in the two sections. The best
method to obtain error variances of $\hat{\theta}_1$ and $\hat{\theta}_2$ proportional to the ratio of $n_1$ to $n_2$ seems to be to set
$\lambda$ equal to 1 $n_1$ times and to 0 $n_2$ times, alternating between the two sections from the
beginning of the test. It should be noted, however, that with a multidimensional item
pool the responses to each item contribute to the variance of $\hat{\theta}_1$ as well as $\hat{\theta}_2$, and therefore
both variances must be calculated over all $n_1+n_2$ items in the test. As a result, either variance
may become more favorable than strictly required.
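
One possible way to alternate the weight between the two sections is sketched below. The paper does not specify the order of alternation, so the proportional interleaving rule and the name `lambda_schedule` are assumptions of this example.

```python
def lambda_schedule(n1, n2):
    """Return n1 values of 1.0 and n2 values of 0.0, interleaved proportionally."""
    schedule = []
    r1, r2 = n1, n2                      # items still owed to each section
    while r1 or r2:
        # Serve the section whose remaining share of its quota is largest.
        if r2 == 0 or (r1 > 0 and r1 * n2 >= r2 * n1):
            schedule.append(1.0)
            r1 -= 1
        else:
            schedule.append(0.0)
            r2 -= 1
    return schedule

weights = lambda_schedule(3, 2)
```

Each entry of the schedule would serve as the value of $\lambda$ in (20) when the corresponding item is selected.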



Numerical Examples

A pool of 500 items was simulated by drawing random values for the parameters $a_{i1}$ and
$a_{i2}$ from U(0.0, 1.3) and for $d_i$ from U(-1.3, 1.3). The ranges of these distributions correspond
roughly to the ranges of the parameter values in a two-dimensional ACT Assessment Program
Mathematics Item Pool used in van der Linden (1996) to study the performance of a linear
programming model for assembling fixed-form tests. The adaptive algorithm was applied to
simulated responses to a 50-item test of examinees with abilities on a two-dimensional grid
defined by $\theta_1, \theta_2 = -2.0, -1.8, \ldots, 2.0$. Because the model in (1) permits MLE only when at least
three responses are available, the full adaptive procedure was started only after responses to
the first three items were simulated. The first two items were defined to have the same
parameter values $(a_{i1}, a_{i2}, d_i) = (1.2, 0.1, 0.0)$ and $(0.1, 1.2, 0.0)$ for all examinees. The third item
was selected from the pool applying (13) to simulated responses on the first two items. The
log-likelihood function for the model in (1) is known to occasionally have an unbounded
maximum. In such cases, which occurred predominantly for the combination of short test
lengths and extreme values of $(\theta_1, \theta_2)$, the ability estimates were truncated at ±2. For each
combination of ability values, 100 replications were produced. The study was repeated for
$\lambda$ = 0.250, 0.375, and 0.500 (larger values of $\lambda$ were omitted because of symmetry).
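
A setup along the lines of this design could be generated as follows. This sketch only mirrors the stated distributions and grid; the function names and the seed are assumptions, and the random draws will not reproduce the pool used in the study.

```python
import random

def simulate_pool(n_items=500, seed=0):
    """Draw (a1, a2, d) with a1, a2 ~ U(0.0, 1.3) and d ~ U(-1.3, 1.3)."""
    rng = random.Random(seed)
    return [
        (rng.uniform(0.0, 1.3), rng.uniform(0.0, 1.3), rng.uniform(-1.3, 1.3))
        for _ in range(n_items)
    ]

def ability_grid(low=-2.0, step=0.2, points=21):
    """All pairs (theta1, theta2) with each coordinate in -2.0, -1.8, ..., 2.0."""
    values = [round(low + step * k, 1) for k in range(points)]
    return [(t1, t2) for t1 in values for t2 in values]

pool = simulate_pool()
grid = ability_grid()
```

The grid contains 21 × 21 = 441 ability points, each of which would be replicated 100 times for every value of $\lambda$.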

Figure 1 shows the estimated bias and mean-squared error (MSE) of $\lambda\hat{\theta}_1^k+(1-\lambda)\hat{\theta}_2^k$
as a function of $\theta_1$ and $\theta_2$ for k=10, 30, and 50 items and the different values of $\lambda$. The
dominant impression from these plots is that test length is a decisive factor but the choice of a
value for the weight $\lambda$ hardly has any effect on the behavior of the estimator. At 10 items the
estimator has an unfavorable MSE for all values of $\lambda\theta_1 + (1-\lambda)\theta_2$. At the extremes, the large
MSE is in part due to an intolerably large bias in the estimator. However, for 30 items the
procedure seems to work reasonably well, and at 50 items both the MSE and the bias seem to
be negligible for all practical purposes. The fact that the results are robust with respect to $\lambda$
implies that from a statistical point of view the size of $\lambda$ is hardly important when setting up
a program of multidimensional adaptive testing and that this factor can vary freely across
applications without having any impact on the statistical properties of the ability estimator.
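
For one ability point, the bias and MSE can be estimated from the replications as in the sketch below. The function name `bias_and_mse` and the toy numbers are illustrative assumptions; in the study, `estimates` would hold the 100 replicated values of the composite estimate and `true_value` would be $\lambda\theta_1 + (1-\lambda)\theta_2$.

```python
def bias_and_mse(estimates, true_value):
    """Estimate bias and mean-squared error from replicated estimates."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, mse

# Toy example with three replications and a true composite of 1.5.
bias, mse = bias_and_mse([1.0, 2.0, 3.0], 1.5)
```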

Conclusions 

The main conclusion from this study is that use of the procedure in this paper seems
practically feasible, provided the number of items in the test is not set too small. For short
tests, the ML estimators of $\theta_1$ and $\theta_2$ are strongly biased and unstable, even when combined
into a linear combination as in this paper. In this case, it seems better to resort to a Bayesian
procedure such as the one in Segall (1996). If empirical information about the correlation between
the ability variables is available from external sources, a Bayesian procedure allows for the
possibility of building this information into the (multivariate) prior distribution for the
abilities. As a consequence, the adaptive estimator can be expected to stabilize more quickly as a
function of the test length.



Figure 1. Bias and MSE functions of $\lambda\hat{\theta}_1^k + (1-\lambda)\hat{\theta}_2^k$ for $\lambda$ = 0.250, 0.375, 0.500 (k = 10, 30, 50).

[Figure not reproduced: panels for $\lambda$ = .250, .375, and .500 at k = 50, 30, and 10.]

Multidimensional IRT has a tradition of using a multivariate information measure as a
tool for analyzing a test by searching the ability space for directions in which information is
maximal. These directions then define the ability composite which the test is assumed to 
measure best (Ackerman, 1994; Fan, 1996; Reckase & McKinley, 1991). The orientation in 
the current paper has been different. The composite was defined to be the parameter of 
interest. Next, the adaptive algorithm was used to have uniform measurement accuracy across 
the ability space. As demonstrated by the flat MSE functions for k=30 and 50 in Figure 1, it is 
possible indeed to have the same favorable measurement precision for all ability points and 
not only for those points that "define the composite". 